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1.  INTRODUCTION 

There  have  been  a  number  of  different  approaches  in  the  scientific 
literature  to  the  frequency  analysis  of  tlse  aeries,  aoat  notably  the 
perlodograa  technique  of  Schuster  (1906)  and  the  maximum  entropy  method 
of  Burg  (1968).  The  perlodograa  approach  was  substantially  improved 
when  the  fast  Fourier  transform  algorithms  became  widely  available 
(Good  1958;  Cooley  and  Tukey  1965)  as  a  means  of  speeding  up  the  com¬ 
putations;  this,  plus  che  variety  of  smoothing  algorithms  proposed  for 
improving  the  statistical  properties  of  the  perlodograa  (Danlell  1946; 
Blackman  and  Tukey  1958;  Parzen  1961;  Cogbum  and  Davis  1974;  Uahba  1979), 
and  the  extension  of  these  ideas  to  higher-order  spectral  estimation 
(Brllllnger  and  Rosenblatt  1967),  have  together  ensured  the  popularity, 
at  least  in  the  statistical  literature,  of  perlodogram-based  techniques 
for  spectrum  estimation  (Koopaans  1974;  Brllllnger  1975;  Bloomfield  1976). 

Concern,  however,  over  the  relatively  low  resolution  properties  of 
the  periodogram  (and  also  of  its  smoothed  versions)  prompted  the  geophysics, 
astronomy,  and  engineering  fields  to  turn  towards  the  higher  resolution 
maximum  entropy  method  of  spectrum  estimation  (Ulrych  and  Bishop  1975; 

Kirk  et  al  1979),  especially  when  it  came  to  the  analysis  of  short  data 
records.  For  example,  in  Veils  and  Chinn* ry  (1973),  the  maximum  entropy 
method  was  used  to  separate  the  Chandler  spectral  component  at  approxi¬ 
mately  0.83  cycles  per  year  (cpy)  from  the  annual  spectral  component  at 
1.0  cpy  in  short  records  of  astronomical  latitude  and  polar  motion  data. 

In  Bolt  and  Currie  (1975),  the  maximum  entropy  method  was  shown  to  be 
superior  to  the  perlodograa  in  terms  of  enhanced  precision  and  number  of 
torsional  elgenperiods  detected  from  data  recorded  at  Trieste  following 
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the  1960  Chilean  earthquake.  The  aarlani  entropy  Method  was  also  used  In 
Jensen  and  Ulrych  (1973)  to  analyse  the  perturbation  notions  of  Barnard's 
Star  and  provide  detailed  information  concerning  the  nunber  and  orbital 
periods  of  its  unseen  companion  planets. 

The  equivalence  of  the  naxlnun  entropy  aethod  with  the  least-squares 
fitting  of  very  high  order  autoregressive  models  was  pointed  out  by  Lacoss 
(1971)  and  van  den  Bos  (1971) ,  and  its  relationship  to  the  marl nun 
likelihood  spectrum  estimator  by  Burg  (1972).  The  paper  of  Bark  (1974)  is 
relevant  here.  Critics  of  the  naxlnun  entropy  method  have  pointed  to  Its 
sensitivity  to  the  selection  of  the  order  of  the  autoregressive  process , 
the  computational  complexity  and  programming  expaaea  of  the  method,  its 
lack  of  a  variance  estimate  for  the  resulting  spectral  density  estimator, 
interpretation  problems  (since  it  is  not  clear  what  the  relationship  is 
between  the  "power"  of  the  naxlnun  entropy  spectral  density  and  the  true 
amplitude  of  the  spectral  conponent  at  a  particular  frequency),  and  the 
difficulty  of  extending  it  to  the  frequency  analysis  of  nultlple  series. 

Ve  refer  the  interested  reader  to  the  papers  by  Ulrych  (1972),  Akaike 
(1969),  Herring  (1980),  and  to  the  references  therein. 

With  these  consents  in  mind,  ve  present  a  different  approach  to  the 
traditional  methods  of  frequency  analysis  of  time  series.  Unlike  the 
perlodogram  technique  and  the  entropy  method,  we  do  not  attempt  to 

estimate  the  power  spectrum  of  a  series,  rather,  we  direct  attention  towards 
the  exploratory  screening,  identification,  and  resolution  of  any  significant 
frequencies  that  night  be  present  in  a  time  series.  Tor  a  discussion  of  the 
resolution  problem,  see,  for  example,  Jenkins  and  Watte  (1968,  pp.  277-279), 
Brilllnger  (1975,  p.  69),  and  Bloomfield  (1976,  pp.  96  and  172). 
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To  accomplish  this  objective,  we  consider  (in  Section  2)  a  quantity 
termed  the  "amplitude  density  function"  which  was  introduced  in  Siddlqul 
and  Izenman  (1981)  adapted  ffroa  an  idea  of  Paul  (1972),  and  vhich  la  derived 
from  the  spectral  representation  of  the  sum  of  an  harmonic  regression  func¬ 
tion  and  a  stochastic  error  process,  and  on  the  inversion  theorem  associated 
with  that  representation.  The  amplitude  density  function  as  defined  in 
Section  2  possesses  the  twin  desirable  properties  of  mean  square  consistency 
and  high  frequency  resolution,  is  related  to  the  finite  Fourier  transform 

of  a  tapered  time  series,  and  as  such,  can  be  computed  using  fast  Fourier 
transform  methods. 

After  using  the  amplitude  density  function  to  identify  a  (possibly,  large) 
number  of  prominent  frequencies  In  a  series,  the  next  step  la  to  select  a 
subset  of  those  frequencies  as  Input  to  a  "hidden  periodicities”  regression 
model  (see  Section  3).  There  are  basically  two  methods  In  the  statistical 
literature  for  testing  the  significance  of  suspected  periodicities,  and 
both  restrict  themselves  to  the  perlodogram  situation;  that  is,  where  the 
frequencies  to  be  tested  are  Fourier  frequencies,  or.  Integer  multiples  of 
T  \  and  orthogonality  relations  simplify  the  analysis.  The  methods  are 
those  due  to  Fisher  (1929,  1940)  and  to  Hartley  (1949).  See  also  Siegel 
(1980).  Little  attempt  has  been  made  in  the  statistical  literature, 
however,  towards  solving  the  more  general  problem  of  testing  arbitrary 
non-Fourier  frequencies  for  significance,  which  is  the  case  when  periods 
are  not  Integral  divisors  of  the  series  length.  Some  work  In  this  direc¬ 
tion  can  be  found  in  Section  4.4  of  Anderson  (1971).  In  Section  3  of  this 
paper,  we  discuss  the  use  of  straightforward  generalisations  (to  the  non- 
Fourier  frequencies  case)  of  the  Fisher  and  Hartley  tests. 


V*  also  introduce  in  this  paper  (aaa  Section  4.3)  a  higb-r  esolut ion 
"frequency  trace"  to  investigate  the  extent  and  direction  of  poeaible 
nonatat loner ity  in  a  tine  series.  All  the  techniques  described  in  chi» 
paper  are  llluatrated  in  Section  4  by  an  extremely  detailed  frequency 
analyala  of  that  perennial  favorite,  the  annual  aunapot  nuabera  aeries 
for  the  period  1749-1979.  Another  novel  feature  preaented  in  Section  4 
la  the  uae  of  a  physically  motivated  data  transformation  of  the  aunapot 
aeriea  Which  dranatieally  inprovea  the  performance  of  the  model  deacrlbed 


in  Section  2 
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2.  SPECTRAL  REPRESENTATION.  THE  I5VERSI0W  THEOREM,  AMP 
THE  AMPLITUDE  DENSITY  FUNCTION 

Let  X(t),  t-0,±l,±2»...,  denote  e  tine  series  observed  et  equlspaced 
tine  intervals,  where  the  origin  and  unit  of  tine  ere  chosen  arbitrarily. 
Consider  the  following  stochastic  nodal: 

X(t)  -  u(t)  +  e(t).  t-0,±l,±2,.. .,  (2.1) 

where 

V(t)  *  °0  +  Sj.jtOjCOsCfrfjt)  +  BjSin(2vfjt)}  (2.2) 

is  the  regression  function,  u(t)  “  E{X(t)>,  the  frequencies,  f^, 
j-l,2,...,k,  as  well  as  their  number,  k,  the  long-term  average  a-,  and 
the  coefficients  (a^,  8^) ,  J-l,2,...,k,  are  all  assuned  to  be  unknown 
constants,  and  the  error  series,  e(t),  t*"0,±l,±2,.. . ,  is  a  strictly 
stationary  process  with  first  two  nonents 

E{e(t)}  -  0  and  cov{e(t),  «(frhi)}  •  c(u),  t,u-0,±l,±2,...,  (2.3) 

where 

£**  1c(u)I  <  •.  (2.*) 


tinder  condition  (2.4),  the  second-order  spectral  density  function 

«,«)  -  C_«2’1'V»> .  -%  <  *  2  %  .  <J  » 

of  the  error  series  is  bounded  and  unlfonly  continuous.  While  Oq  and 
the  coefficients  (a^,  Bj),  J-l,2,...,k,  snter  the  model  (2.1)-(2.2)  in  s 


linear  fashion,  the  frequencies  f ^ ,  J-1,2 . k,  (and  k)  enter  in  a  non¬ 

linear  way.  Thus,  if  k  and  the  f  requeue  lea  f^,  j-l,2,...,k,  were 
known  1  priori,  the  coefficients  could  be  estimated  by  standard  regression 
techniques. 


2.1.  The  Amplitude  Density  Function 

For  convenience,  replace  the  trigonoaetrlc  functions  In  (2.2)  by 

sin  6  -  ~(ei8-e"18)  and  cos  6  -  ±(e18+e"18). 


and  write  Yj  “  ® j**i  Bj  •  Denote  also  by  Yj  the  complex  congugate  of  Yj ,  so 

e  mm  e  e 

that  1yj|  “■  YjYj  “  o j  +  Bj  is  the  squared  amplitude  of  the  complex  number 
Yj.  Then,  (2.2)  becomes 


V  (O  -  a0  +  iJ.jtYj*2^*  +  Yj«“2Tlf Jfc) 


-  /  e2’lftdA(f), 
-^1 


(2.6) 


where  the  function  A^ (f)  is  defined  in  terms  of  its  differential  Increments, 
namely. 


d^(0)-o0,  d^(fj)  -  Yj/2,  dA^-fj)  -  7j/2,  j-l,2,...,k,  (2.7) 

and  dA^(f)  -  0  for  all  other  f  in  [ -^J,  %].  The  Four ier-Stlelt J es  Integral 
oo  the  right-hand-side  of  (2.6)  is  the  spectral  representation  of  the 
regression  function  u(t)  In  (2.2),  and  the  function  ^(f),  J|  <  f  ^ij,  in 
(2.6)  is  of  bounded  variation. 

Furthermore,  by  a  theorem  of  Cramdr  (1942),  the  error  series  in  (2.1) 
has  a  spectral  representation  given  by 


H  ; 

«(t)  -  /  e 


(2.8) 


2rift 


dAg(f), 


where  A((f)  is  a  complex-valued  stochastic  process  having  orthogonal 
increments;  that  is,  E{dAg(£)}  “  0,  and 

E{dAc(f)dAe(f)>  -  «{f-f,}ge(£)d£,  -J*  <  f  <  (2.9) 


where  S{a}  is  the  Kronecker  delta  (4{a}«l  i£  o*0,  and  is  0  otherwise). 

Combining  (2.6)  end  (2.8),  the  spectral  representation  of  the  X(t)- 
process  Is,  therefore,  given  by 


X(t)  - 


j£  e2*lft  dA^f), 


(2.10) 


where  A^f)  "  A^(f)  +  A£(f).  In  this  representation,  Aj(f)  has  jumps  at 
£  -  ±fj,  J«l,2, ... ,k,  and  at  £  ■  0,  and  is  stochastically  continuous  at 
all  other  points.  We  shall  henceforth  call  the  process  A^(f) ,  Jj  <  {  <  >j, 
in  (2.10)  the  (complex-valued)  amplitude  process  corresponding  to  X(t), 
t-0,±l,±2,.. or,  just  the  amplitude  process  if  the  series  la  understood. 

Consider  now  a  partition  of  the  frequency  band  (0,  %)  Into  a  number 
of  nonoverlapping  subbands  of  equal  length  Af  >  0,  and  let  f  be  an 
arbitrary  frequency  in  (0,  %)  such  that 


0  <  f-**Af  <  f-t*sAf  <  >j. 


(2.11) 


Define 


AA^f )  -  A^(f4*jAf)  -  A^fJjAf)  -  A^  (f)  +  AAg(f) 


(2.12) 


to  b«  the  Inc: 


it  of  the  aaplitude  process  in  the  frequency  subband 


(fJjif,  f+*j&f)  of  length  Af  around  f.  Now,  if  we  shrink  the  value  of  Af 
sufficiently,  it  will  eventually  become  smaller  than  aln^  <  ^  ^  ^Cjf^f^J), 
For  such  a  Af,  at  most  one  of  the  k  frequeneles  in  the  model  (2.2) 
will  fall  into  an  Interval  of  length  Af  on  the  frequency  axis.  It  follows 


that,  for  Af  small  enough. 


!0,  if  no  f .  is  in  (f-JjAf, 
!  3 

y  Yj,  if  there  is  an  f^  in 


f^sAf) 


(f-J*Af,  f+Js&f) 


...  ,k. 


(2.13) 


AAe(f)  -  gc(f*)df. 


(2.14) 


where  f*  is  in  (f-fjAf,  f4*jAf).  Thus,  as  Af  4  0, 


AAx(f)  ♦  0  and  AAx(f)/Af  ♦  g#(f),  if  f  *  fjt  J-1,2 . k,  (2.15) 


AAx(f)  ♦Iyj  «od  AAx(f)/Af  +  if  f  -  ty  j-l,2,...,k.  (2.16) 


The  following  Inversion  theorem  allows  us  to  express  AA^(f)  in  terms  of 
the  series  X(t),  t-0,±l,±2,.. .  . 


Theorem  1.  If  f  ±  %6f  are  continuity  points  of  A(|(f),  and  hence  also  of 
A^(f)»  then 

AAj^f)  -  l.i.m.^  — jp-)  t”2irift  x(t),  ->s  <  f  <  J|,  (2.17) 


Proof.  See  Doob  (1953),  Theorem  4.1  of  Chapter  X,  or  Hannan  (1970), 
Theorem  3"  of  Chapter  11. 


This  theorem,  therefore,  provides  us  with  a  natural  estimator  of  AA^(f) . 
Let  T  be  the  number  of  data  values  In  the  time  series,  and  let  H  be  such 
that  2N+1  <  T,  but  of  the  order  of  T.  Take  the  time-origin  to  be  the 
(gfl)st  value,  and  write 

AA<T)(f)  -  e‘2rlft  x(t)f  J,  <  f  <  |J.  (2.18) 

Hote  that,  for  real-valued  X(t), 

AA*T)(f)  -  AA*T)(-f)  and  AA<T>(f+l)  -  AA<T)(f), 


so  that,  without  loss  of  generality,  ve  may  take  the  principal  domain  of 

the  estimator  (2.18)  to  be  the  frequency  band  0  «  f  <  %. 

There  are  alternative  ways  of  visualizing  AA£  (f).  The  finite 

Fourier  transform  of  the  series  X(t),  t*0,il,±2, . . . ,±H,  Is  given  by 

E*  e“2*if t  X(tj^  o  <  f  <  4  ,  which,  when  integrated  between  the  fre- 
t*— N 

quencies  f-*jAf  and  f+%Af »  equals 


f+»jAf 

/  (  e”_h  e‘2rift  X(t)  ]  df 

f-'jAf  f+ijAf 

-  C  /  e‘2wlft  df  ]  x(t) 

f-JjAf 

-  Z’t-M  ik  (•iwtA£-"i*tAf)  •‘2,rlft  X(t) 

-  AA^f), 

so  that,  if  Af  is  taken  to  be  small  (relative  to  1/T),  then  in 

(2.18)  will  be  essentially  proportional  to  the  finite  Fourier  transform 
of  X(t)  at  frequency  f  (0  <  f  <  >*).  The  factor  h(t)  ■  sin(*tAf)/*t  In  (2.18) 
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has  maximum  value  Af  (at  t-0) ,  and  will  be  very  close  to  Af  for  all 

_4 

other  values  of  t.  Zn  fact,  if  T  -  231  and  Af  -  10  (see  Section  4), 

_4 

then  the  Minimum  value  of  h(t)  is  0.9998  x  10  (at  t*±U5),  and  the 
relative  variation  of  that  function  is,  therefore,  (niinriniun-ininlBnnn)/Af 
<10  ,  or,  0.1  percent.  Paul  (1972)  had  a  similar  idea  of  using  the 
integral  of  the  finite  Fourier  transform  of  X(t) ,  but  did  little  more 
than  dwell  at  length  on  the  inversion  formula  (2.17). 

Furthermore,  for  a  given  value  of  Af,  (2.18)  may  be  regarded  as 
the  finite  Fourier  transform  of  the  "tapered"  series  Y(t)  ■  h(t)X(t) , 
t-0,±l,±2,...,±N,  where  (Af)~^h(t)  is  related  to  the  Riemann-Lanczos 
convergence  factor  for  Fourier  transforms;  see  Lanczos  (1956,  Chapter  IV) 
and  Brillinger  (1975,  Section  3.3).  If  we  let  H(T)  (f )  -  xj^e-2*1* ^(t) 
be  the  finite  Fourier  transform  of  h(t),  then,  for  Af  very  small,  it 
is  easy  to  show  that  H^(f)  «  (Af)sin(rfT)/sin(irf) ,  which,  as  a  function 
of  f,  is  concentrated  around  f  -  0  with  maximum  value  TAf,  and  fluctuates 
in  sign  for  0  <  f  <,  see  Brillinger  (1975,  p.  51). 

Consider  now  the  statistic 

a^T)(f)  -  |AA<T)(f)|/Af  (2.19) 

as  an  estimator  of  the  (real-valued)  function 

a^CO  -  |AAx(f)J/Af  ,  (2.20) 

for  0  <  f  <  k.  Vie  call  the  statistic  (2.19)  the  amplitude  density 
function  of  the  values  X(t),  t-0,±l,±2,...,±N.  In  doing  this,  we 
differ  from  Paul  (1972),  who  used  the  same  terminology  for  the  quantity 
(2.17).  As  an  immediate  consequence  of  the  above  theorem,  we  have  the 
following 
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Corollary.  In  the  limit,  as  T  la  a  mean  square  consistent 

estimator  of  ax(f ) ;  that  Is. 

li»Il^E{|a<T)(f)-ax(f)|2}-  0,  0  <  f  <  lj  .  (2.21) 

Proof.  Use  the  fact  that  (mean  square)  convergence  of  a  sequence  of 
complex  numbers  to  a  complex  number  implies  (mean  square)  convergence 
of  the  real  and  Imaginary  parts  of  the  former  to  the  corresponding  real 
and  imaginary  parts  of  the  latter. 

For  practical  applications,  attention  will  usually  center  on  the 

/n*  \ 

user’s  choice  of  Af  %*en  computing  a£  (f) ,  0  <  f  <  %,  in  (2.19). 

Dividing  the  frequency  interval  (0,*j)  into  disjoint  subintervals  each 

of  length  Af  means  that  just  under  l/2Af  points  need  to  be  computed 

(T) 

to  obtain  the  complete  graph  of  a£  '  (f) ,  0  <  f  <  .  Thus,  for  example, 

-4 

if  Af  ■  10  (see  Section  4),  then  we  need  to  compute  approximately 
5000  values.  In  general,  the  choice  of  Af  will  be  tied  to  the  length 
of  the  series,  and  to  the  spectral  distribution  of  the  fc  frequencies. 
Typically,  the  longer  the  series  and  the  larger  the  number  of  discrete 
frequencies  to  be  recovered,  the  smaller  the  value  of  Af  that  should 
be  used.  One  procedure  that  works  quite  well  if  the  series  is  long  and 
if  little  is  known  concerning  its  frequency  structure  is  to  use  a  sequence 
of  decreasing  values  of  Af  which  provide,  in  turn,  an  increasing  degree 
of  frequency  resolution  and  smoothness  of  the  graph  of  a£  (f),  0  <  f  <  % 

_4 

see  Slddiqul  and  Izenman  (1981).  Generally,  a  choice  of  Af  "  10  seems 
to  give  useful  results  for  a  wide  variety  of  series  lengths;  see  the 
discussion  in  Section  4.3  of  this  paper.  For  especially  long  series 
lengths,  Af  -  L0  may  be  preferred. 
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2.2.  Frequegcy  Resolution  Properties 

We  next  address  ourselves  co  che  frequency  resolution  properties  of 
the  amplitude  density  function  (2.19).  Let  Af  be  fixed,  but  smaller 
than  mln1<B^n<k(|f,tt-fn|).  Since,  as  a  function  of  f,  a^Cf)  in 
(2.20)  possesses  local  maxima  at  the  k  points  f-f^ ,  j-1,2, . . . ,k,  the 
locations  of  the  k  largest  relative  maxima  of  ^(f)  (out  of,  say, 
the  m*  £  k  local  maxima  actually  obtained)  provide  us  with  appropriate 
estimates  of  the  f ^ ,  J*l,2,  ...,k.  The  resolution  properties  of  a^^(f) 
then  follow  from  the  fact  that 

[a£T)(f)]2  <  !z"_Me“2,rlftX(t)|2  ,  (2.22) 

and  from  published  results  on  the  asymptotic  properties  of  periodogram 
estimates  of  the  f ^ ,  j-1,2 . k,  derived  under  a  variety  of  assump¬ 

tions  on  the  error  series.  Within  the  context  of  the  present  paper,  we 
consider  the  following  assumption. 

(A)  The  series  e(t),  t*0,±l,±2, . . . ,  in  (2.1)  is  assumed  to  be 
a  sequence  of  independently  and  Identically  distributed  (i.i.d.)  random 
variables  (a  pure  noise  series)  each  with  mean  zero  and  finite  variance ; 
that  is,  in  (2.3)  we  set  c(u)  ■  0  for  u  t  0,  while  (2.4)  becomes 
c(0)  <  •».  See  Whittle  (1952),  Walker  (1971),  and  more  recently  Damsleth 
and  Spjtftvoll  (1982). 

In  the  following,  we  write  f  “  ^i'*2*‘*‘**k^  and 
f(T)  -  (f<T),f<T),...,f£0),  where  the  f^,  j-l,2,...,k,  are  the 
k  largest  local  maxima  of  a^^  (f ) ,  0  <  f  <  *S ,  and  set 
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(f)  -  I 


J-l 


(2.23) 


Then,  subject  to  an  appropriate  separation  condition  imposed  on  the 
J-1,2 , . . . ,k,  such  that 


t 


ltaT~  *1“l<.^<k<Ilf.-f.l>  '  "  <*•**> 

(which  is  needed  to  ensure  that  two  different  frequency  estimates  do  not 

converge  in  probability  to  the  same  value),  $£^(f)  in  (2.23)  is 
(T) 

maximized  when  f-f  .  We  shall  also  write  f,  _,  a  _,  and  3  _ 

”  *  j»°  j»°  j 

for  the  true  values  of  f ^ ,  c^,  and  3^  respectively,  j-1,2, . . .  ,k, 
and  without  loss  of  generality,  we  set  -  0. 

Theorem  2.  Let  X(t),  t«0,±l,±2,.. .,  be  a  time  series  satisfying  (2.1) 

and  (2.2),  where  the  error  series  c(t)  satisfies  assumption  (A),  and 

(2.24)  holds.  Then,  f{T)-f,  «  on(T-1).  j-1.2.....ki  that  is.  fJT) 

converges  in  probability  to  f.  M  as  T  + 

j 

Proof ,  It  suffices  to  consider  the  function 

^T)(f)  -  zj-1  bJT>(fJ)  ,  (2.25) 

where  b<T)(f)  -  (&f)2[aJT)(f)  J2,  0  <  f  <  %.  Set  Tt#0  » 

£*■1,2 . k,  and  H(T)(u)  -  zj._s  h(t)e~2iriut,  0  <  u  <  %.  Then, 

following  the  methods  of  Walker  (1971),  we  have  that 
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- 


_  -2wif.t  v  2rif e  t  -2»lf  .  -t 

“  !£._*  W1)*  *  +Y1.0e  * 

-  iii^i.oH(T)^r£i.o)+7i.oH<T)(fi+£i.o>)  + 


v  -2»l£.t  2 

+  iJL.g  3  e(t)| 


.  -  -2*lf .  t  -  B  -2wlf,t 

♦  llJ.-Nh(t)e  i  e(t)l2  +  2«e{aJBi_Hh(t)e  J  e(t))  x 


«  'Itl<Yt.o',<T)(fj-£l.O)+Yt.OH<T>«J«l,0»l)  •  (2.26) 

Since  Che  variance*  of  Che  real  and  Imaginary  pares  of 
_N  -2wif.C 

Z^__Bh(c)e  J  e(c)  are  each  respectively  4c(0) (Af  )  T+  0(1) ,  Chen 

*  -2¥if  C  . 

V- N  h(t)*  3  e(t>  "  °p(T^  * 

There  are  k2  differences  In  (2.26)  of  Che  for*  f^-f^  0>  1^1,2,..., k, 

J»l,2,...,k;  however,  according  Co  Che  separaclon  condition  (2.24),  only 
k  of  Chase  differences  can  be  0(T~l),  namely,  f^-f^  Q,  J-1,2 . k. 

Sow,  for  small  values  of  u,  Che  funcelon  |H^(u)|2  ■  (Af ) 2sln2 (vuT) / sin2 (wi) 

2 

decreases  monoConically  from  iCs  absolute  maximum  of  (approximately)  (AfT) 

Co  a  ■<»<■«■  of  sero  at  u ■  T*"^ .  Bence,  if  f^-f^  q  •■•11  enough. 


(2.27) 


k£T*(fj)  i»  dominated  by  the  tern  0)|2J  *“  fact» 

‘f'S.o’  -  W3.0lJtt')V  +  0p<iJ«)  . 

and  If  (S^)  Is  a  sequence  of  sets  in  f-space  for  which  (2.24)  holds, 
then. 


(2.28) 


If  we  now  let 


*,(6)  -  (f:  Ifj-fj^M/T.  3-1,2 . k}  , 

and  [Rj(6)]c  -  S^-Rptd)  be  the  coapleaest  of  k^(6)  with  respect  to 
S^,  then,  for  sufficiently  snail  4, 

p  lin__  [T‘2sup  {^T)(f))) 

fclRjCfi)]®  X  ' 

*  »  ltaK-  IT'2  ’?-l<“?.0^.0’“»£t(v{)J,tl«<I)«1-'1.0>|2» 

-  (Af)2  EJ.i<°j,o+®3,0>{p  1S-«[,ln2<ir5)/T2*ln2<ir4/T)3} 

<  Wf)1 

■  »  '-tv  1t"24T)<£o)I  • 


-16- 


1"h*r*  lo  “  ^i,O**2,0**‘***k,O^*  H#Be* 

» - 0  • 


and  th*  thtom  follows. 

Thsso  rosulcs  have  boon  cboekod  by  simulating  o  nunbor  of  series 
consisting  of  cosinusoids  with  discrete  frequencies  fairly  close  to 
each  other,  with  varying  relative  amplitudes,  and  with  and  without  an 
additional  l.i.d.  Gaussian  error  component  having  a  standard  deviation 
the  else  of  the  largest  amplitude.  He  can  report  that  In  ovary  case 
so  far  considered,  all  frequencies  In  each  of  the  generated  series  were 
recovered  with  an  error  of  about  1/10T  by  ualng  the  approach  outlined  in 
this  paper.  Specific  details  of  these  simulations  will  appear  elsewhere . 

For  the  case  In  which  the  error  series  e(t)  la  not  an  l.i.d. 
sequence  of  random  variables,  results  have  bean  obtained  for  periodograa 
estimates  of  frequency  by  Hannan  C1971,  1973)  and  by  Walker  (1973) . 
Hannan,  In  his  1971  paper,  assumes  e(t)  to  be  a  linear  process;  that 
Is, 


e(t)  -  Iu  g(u)£(t-u) ,  Iulg(u)|  <-  ,  (2 .29) 

where  the  £(t)  fora  a  pure  noise  series  with  sero  mean  and  unit  variance 
and  possess  a  continuous  spectral  density.  (Hannan  refers  to  this  as 
Condition  A.)  The  error  series  In  (2.29)  Is  strictly  stationary  with 
finite  second  moments.  Walker  (1973)  expands  the  fora  of  (2.29)  slightly 
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by  assuming  that  the  5(t)  are  aleo  Gaussian  with  zero  mean  finite 
variance,  and  that  the  coefficients  are  known  functions,  g(u,-)»  of 
an  unknown  parameter  vector  6.  Hannan  (1973)  makes  a  more  general 
assumption  that  e(t)  la  a  strictly  stationary  ergodlc  process  with 
■sen  zero  and  finite  variance,  and  then  obtains  an  almost  sure  conver¬ 
gence  result  for  the  perlodogram  estimator  of  a  single  (k>l)  cosinusoidal 
component.  Such  a  result  may  be  extended  to  apply  to  the  methods  of 
paper  through  the  relationship  (2.22)  above. 


3.  sagger  SEARCH  PROCEDURES 


(r\ 

The  graph  of  the  amplitude  density  function,  a^  (f),  0  <  f  <  «j, 
will  typically  display  a  fairly  large  number  of  local  peaks,  say,  a*. 
The  neat  logical  step,  therefore,  is  to  separate  out  the  k  signifi¬ 
cant  peaks  (k  <  m*)  from  those  that  more  properly  represent  the 
noise  level  of  the  time  series  in  question.  Tests  for  Identifying  a 
dominant  set  of  periodicities  have  traditionally  been  based  on  the 
largest  few  ordinates  of  the  graph  of  the  perlodogram  of  the  series; 
see  Fisher  (1929,  1940),  Bartley  (1949),  Siegel  (1980),  end  Section 
6.1.4  of  Priestley  (1981).  The  problem  is  also  related  to  the 
"bump-hunting"  problem  of  Good  and  Gaskins  (1980). 

For  series  in  which  multiple  frequencies  are  present,  a  stepwise 
search  procedure  is  outlined  on  p.  34  of  Walker  (1971);  there,  it  is 
suggested  that  one  frequency  at  a  time  be  estimated  by  examining  the 
largest  perlodogram  ordinate  of  the  residual  series  after  fitting  all 
previously  determined  frequency  components.  Versions  of  Walker's 
suggestion  have  appeared  in  recent  articles  by  Hill  (1977)  and  by 
Demsleth  and  Spjtftvoll  (1982).  Hill,  on  the  one  hand,  in  what  must 
have  amounted  to  a  veritable  tour  de  force,  claims  to  have  used  a 
computational  technique  involving  over  100  Iterations  to  estimate  each 
of  forty  selected  frequencies  (one  at  a  time)  in  the  series  of  monthly 
sunspot  numbers  (the  actual  Iteration  and  search  techniques  are  not 
explained  in  the  paper,  however).  On  the  other  band,  Demsleth  and 
Spjdtvoll,  also  using  essentially  Walker's  stepwise  algorithm,  end 
up  by  selecting  only  eight  frequencies  in  the  annual  sunspot  number 
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serles.  A  search  procedure  for  the  sunspot  series  which  results  In  a 
reasonable  compromise  between  two  such  extremes,  therefore,  appears  to 
be  a  worthwhile  goal. 

In  Slddlqul  and  Izenman  (1981) ,  a  two-stage  search  procedure  is 
described  to  estimate  k  and  determine  those  frequencies 

f^.f^ . f£T^  which  are  prominent  In  the  series  X(t);  the  first 

stage  consists  of  a  straightforward  adaptation  of  Fisher's  well-known 
test  for  periodicities,  while  the  second  stage  uses  information  about 
the  error  spectrum  to  make  the  final  choice.  Here,  we  review  that  method 
and  compare  it  with  an  adaptation  of  Hartley's  (1949)  analysis  of  variance 
based  method  using  a  maximum-F  ratio  statistic  for  fitting  one  frequency 
at  a  time.  We  have  found,  empirically,  for  a  number  of  series,  that  the 
latter  method  yields  results  that  are  very  close  to  those  obtained  by 
the  Siddiqul-Izenman  procedure,  yet  possesses  the  attractive  feature  that 
It  is  semi-automatic  and  can  be  carried  out  using  only  a  least-squares 
regression  program.  Simulation  studies  of  both  search  procedures  for  a 
variety  of  situations  are  clearly  needed  for  future  guidance. 

3.1.  The  Siddiqul-Izenman  Procedure 

(T) 

Preliminary  search.  For  the  Jth  frequency,  »  identified  as 

one  of  the  m*  local  peaks  in  the  amplitude  density  function,  compute 
the  following  quantities: 


ajT)  "  |  I^1(X(t)-X)cos(2irfjT)t), 

bjT>  "  T  ^„1(X(t)-X)sin(2sf^T)t), 


(3.1) 

(3.2) 
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and 

Ccjx>32  -  t.^32  +  [bjT> 32.  0.3) 

where  X  ■*  T  '^.i  «')•  for  j-l,2,...,a*,  and  order  the  f^X\  j»l,2,...,i 
according  to  the  relative  magnitude*  of  their  [cj^  ]2,  j-1,2, . . .  ,w*.  thus, 
if 


Ug>l2  >  l«gjj2  >  *  -  >  UCT> 


(■*) 


O.*) 


are  the  ordered  valuea  of  (3.3),  this,  in  turn,  orders  the  frequencies  in 

a  corresponding  fashion  as  f^,...,  *(a*)*  Kort»  ®Pply  Fisher's 

test  for  periodicity  to  these  a*  ordered  frequencies.  This  test 

(Fisher  1929)  is  designed  to  test  H^:  W)  against  H^:  k  >  1,  where 

k  is  the  true  number  of  discrete  frequencies  in  the  process  under  review. 

(T)  2 

Although  the  test  requires  (c^  ]  to  be  computed  for  each  of  the  H 

Fourier  frequencies  f^  ■  r/T,  r»l,2,...,H,  there  exists  an  integer 
r j ,  between  1  and  X,  such  that 


(3.5) 


and  so,  for  large  T,  the  value  of  [c{X)32  in  (3.3)  should  be  fairly  close 
(T)  2 

to  the  value  of  [c^  3  corresponding  to  r^/T.  Thus,  the  generalisation 
of  Fisher's  test  (Stevens  1939;  Fisher  1940)  will  be  approxiaately  valid 
for  each  of  the  a*  normalized  versions  of  (3.4) 


Cc(j)]2/2,T*  j-1'2* 


(3.6) 
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2 

where  is  ths  total  varlsnes  of  th«  observed  series.  Sst  a  >0.05 
or  0.01,  and  define  the  percentile  g.  by  the  relation 


-<.«>  >  «,>T>a)  -  «  . 


(3.7) 


Values  of  gj>Tf0  have  been  tabulated  by  Shlashoni  (1971)  for  c^O.Ol 

and  0.05,  and  for  (i)  K- 5(5)50,  J-1,2,5,7,10,  and  (ii)  H- 100 

(100)3000,  J-l, 2, 5,10,25, 50.  Sow,  compare  the  value  of  [c^]2  with 

2 

the  appropriate  valua  of  2e^g^  ^  Q,  and  carry  out  thla  comparison 

according, to  the  following  recipe  suggested  by  Shlmshoal.  First,  take 

the  value  of  8^  T  a  <od  pass  all  frequencies  whose  [e^2^]2  values 

are  greater  than  28^8ijj>a’  '***  the  Jj**  frequency  be  the  first  one 

to  fail  this  test  (J.  >1).  Sort,  taka  the  value  of  g.  and 

A 

2  1 

pass  all  further  frequencies  whose  [c  ')  values  are  greater  than 

2  Vk 

2sT*j  j  a-  When  a  frequency  fails  this  second  test,  say,  the  J2 
frequency,  take  the  value  of  g,  _  and  pass  all  frequencies  for 

jMfTpd 

(T)  2  4  2 

which  their  [c  J  values  are  greater  than  Zs^g^  r  And  so  on. 

2  2*  * 

The  process  ends  when  a  new  value  of  2s^g  is  greater  than  the  value 
(T)  2 

of  [c  ]  it  was  supposed  to  test.  All  remaining  frequencies  are 


then  failed  at  this  level  of  significance.  The  number  of  frequencies 
that  are  retained  (out  of  the  original  n*)  is  denoted  by  m^.  In  certain 
situations,  major  frequencies  mey  be  hidden  by  virtue  of  their  close 
proximity  to  more  powerful  frequencies.  The  next  step  is  to  screen  the 
residual  series 


c(T)(t)  -  X(t)  -  |i{T)(t),  t-1,2 . T, 


(3.8) 
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m 

where  ^  (t)  is  the  estiaated  regression  function  consisting  of  the 
above  frequency  components;  the  2m^+l  coefficients*  Oq  and 
(a^,  6 j ) ,  J-l,2*...,m^,  in  the  regression  function 

Pj(t)  -  oQ  +  Ij^1{OjCos(2irf ^t)  +  8jSin(2*f^jt)}  (3.9) 

are  estiaated  by  least-squares*  using  any  standard  regression  computer 
program.  Screening  is  done  by  computing  the  amplitude  density  function 
of  (3.8)*  and  repeating  all  the  above  tests.  If  *2  denotes  the  number 
of  frequencies  declared  significant  at  this  second-step*  .then 
is  the  total  number  of  significant  frequencies  detected. 

Final  Search.  Estimate  the  2iH-l  coefficients,  and  (a^,  B^), 

j-1, 2 *...*»,  in  the  regression  function 

y(t)  -  oQ  +  E®-1{ojcos(2*f2)t)  +  PjSinawf^t)},  t-l,2....,X, 

(3.10) 

(T)  (X)  (T) 

by  least-squares,  denote  the  coefficient  estimates  by  Oq  and  (a^  ,  '), 

j"i,2,...,a,  respectively,  and  let 

-  {C«jT)]2  +  C8jT)]2)\  j-i,2,...,m.  (3.11) 

then,  asymptotically,  for  large  X  (Crenander  and  Rosenblatt  1957*  p.  246), 
the  approximate  covariances  of  (3.11)  are 

cov{|y£T)|.  WjT) | )  -  «{i-3}scCf(J))/T.  i.J-1,2,...,*, 

(3.12) 

where  «{<*}  Is  the  Kronecker  delta.  If  g^(f)  is  a  suitable  estimator  of 
ge(f),  snd  if  g<T)(f)  <  g<T)(f*)  tor  some 


fA,  then  the  bound 


-23- 


var{|TjT>|)  <  gJT>(f*)/T.  (3.13) 

yields  an  approximate,  but  conservative,  95  percent  confidence 
Interval  on  "  (a^  +  namely, 

lyj^l  ±  2{g^T)(f#)/t}Jl.  (3.14) 

As  long  as  any  of  the  a  frequencies  has  an  associated  confidence  Interval 
(3.14)  which  does  not  contain  the  value  zero,  then  that  frequency  Is 

declared  significant  and  Is  retained  In  the  final  model.  Otherwise,  the 
frequency  Is  dropped  from  further  consideration.  The  value  k  la  then 
taken  to  be  the  number  of  significant  frequencies  from  this  final 
selection  stage. 

3.2.  An  F-Dlrected  Search  Procedure 

Another  significance  test  for  periodicities  was  proposed  by  Hartley 

(1949),  who  used  a  least-squares  regression  approach  to  finding  a  different 

(T)  2 

normalization  for  the  [c^j]  values  In  (3.4)  than  was  used  In  (3.6). 
Since  Hartley's  procedure  Is  not  as  well  known  as  that  of  Fisher,  we 
shall  first  review  Its  salient  points  for  the  periodogram  case. 

From  the  N  Fourier  frequencies  available,  nominate  m  of  them  to 
be  potentially  significant.  Denote  these  by  r^/T,  J  -  l,2,...,m,  where 
r^  Is  an  Integer  between  1  and  H.  Next,  set 

X(t)  -  oQ  +  (a^ cos (2irr j t/T)  +  3^sin(2frr^t/T))  +  e(t)  ,  (3.15) 

where  e(t)  Is  Gaussian  and  satisfies  assumption  (A)  above,  and  estimate 
(OjtfJj)*  Jal,2,...,a,  by  least-squares.  The  least-squares  estimates  are 


(T) 

given  by  (a^  ,b^  '),  j-l,2,...,m,  respectively  (see  (3.1),  (3.2), 

(T1 

where  f j  1  ■  r^/T,  j-l,2(...,s),  end  the  fitted  regression  function 
by 

X(T)(t)  -  X  +  iJ.jU*1* 008(2^ t/T)  +  b<T) 810(2^ t/T)}.  (3.16) 

By  virtue  of  the  orthogonality  of  the  trlgonooetric  functions  in  (3.16), 
the  regression  sum  of  squares  (SS  )  can  be  expressed  as  the  sum  of  m 
unco rr elated  sums  of  squares;  that  is, 

SSreg  -  J*ml  (X(T)(t)-X)2  -  |  Y^ml  [cj^]2  (3.17) 

(see  (3.3)),  where,  under  the  hypothesis  that  *  -  0  in  (3.15),  each 
T  fTl  2 

j  [Cj  ]  in  (3.17)  has  an  Independent  chi-squared  distribution  with 
two  degrees  of  freedom  (d.f .) .  The  residual  sum  of  squares  (RSS  )  from 
the  m-component  regression  (3.14),  namely, 

HSSb  -  Zj-i  U(t)-XCT)(t))2  ,  (3.18) 

has  an  Independent  chi-squared  distribution  with  T-2m-l  d.f.  Hartley’s 
test  statistic  for  individual  (Fourier)  frequencies  is,  then 

F(j)  -  (f  tc^]2/RSSB)(T-2m-l)/2  ,  (3.19) 

for  j*l,2,...,m,  where  the  (c^j]  . ^c(mp  *ra  the  a  largest 

values  in  (3.4)  (ef.  (3.6)).  As  an  approximation  to  the  upper  percentage 
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points  of  the  distribution  of  (3.19),  Bartley  suggested  that  the  (Fourier) 

(T)  2 

frequency  corresponding  to  [c^  ^  ]  be  declared  significant  at  the  aZ 

level  if  F^j  j  is  larger  than  the  tabulated  (a/(m-J+l))Z  point  of  the  P 
distribution  with  2  and  T-2a-l  d.f.,  J-l,2,...,m.  The  value  of  k 
is  then  taken  to  be  that  number  of  (Fourier)  frequencies  that  are  declared 
significant  by  this  method. 

Typically,  however,  the  a  largest  local  peaks  In  the  graph  of  the 

amplitude  density  function  will  not  be  located  at  Fourier  frequencies, 

(T)  2 

and,  consequently,  the  [c^  '  ]  ,  j~l,2,...,m,  will  not  be  uncorrelated. 

We  could,  of  course,  argue,  as  we  did  in  Section  3.1,  that  the  differences 
between  the  frequency  estimates  and  their  nearest  Fourier  frequencies 
will  be  relatively  snail  (see  (3.5)),  so  that,  for  i^j,  [c^]2  and 
[CjT^]2  will  be  approximately  uncorrelated,  and  hence  that  Hartley's 
technique  will  be  approximately  valid.  However,  we  prefer  Instead  —  both 
for  computational  and  for  statistical  reasons  —  to  re-interpret  hartley's 

i 

procedure  In  a  slightly  different  way,  and  then  use  that  particular  formu¬ 
lation  here. 

It  is  not  difficult  to  see  that  for  the  special  case  of  Fourier  fre¬ 
quencies  Hartley’s  method  is  equivalent  to  the  following  multistage  proce¬ 
dure,  and,  by  virtue  of  the  argument  noted  in  the  previous  paragraph,  .» 
approximately  equivalent  for  the  more  general  case.  Since  at  each  stage 
we  carry  out  a  series  of  one— component  regressions,  consideration  of  a 
fairly  large  number  of  frequencies  is  computationally  feasible. 


especially  if  T  is  large.  Furthermore,  because  at  each  stage  we  select 
a  frequency  using  an  F  statistic  formulation,  the  overall  procedure  is 
called  "F-directed"  and  resembles  a  "forward  stepwise"  regression  algorithm. 
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Xn  the  following  discussion  we  attend  to  the  general  case  and  write 

Ct) 

fj  ,  J-1,2 . a,  for  the  a  nominated  frequencies;  if  these 

(Ti 

frequencies  are  Fourier  frequencies,  then  fj  -  r^/T,  j-1,2 ....  ,m, 
for  some  Integer  r^  between  1  and  N. 

Consider,  then,  the  problen  of  selecting  k  frequencies  for  the 
regression  model 

X(t)  -  Oq  +  zJ-1{ajcos(2rf^t)  +  3jsln(21rf^t)}  +  e(t)  ,  (3.20) 

(T)  (T) 

where  f^  Is  the  first  frequency  selected,  f is  the  second  frequency 

selected,  and  so  on,  from  a  set  of  a  potentially  significant  frequencies 

fj  ,  j-1,2,...  ,m,  k  <  a,  and  where  e(t)  Is  Gaussian  and  satisfies 

assumption  (A).  The  selection  rule  Is  the  following. 

1.  At  step  1,  let  be  the  largest  of  the  a  F-ratio  statistics, 

BAX 

Fj^,  j-1,2,..., a,  obtained  as  follows.  Regress  X(t)  on  the  pair  of 

(T)  (T) 

components  (cos(2rfj  t) ,  sln(2vf^  t)) ,  and  obtain  the  regression  sum 
of  squares,  , ,  having  2  d.f.  Then,  compute  the  statistic 

F*1}  -  {SS*2  , /ESS  } (T-2a-l) /2  ,  (3.21) 

j  reg.j  a 

where  ESS^  is  the  residual  sua  of  squares  obtained  by  regressing  X(t) 
on  all  a  components  (cos(2»fj^  t),  sin(27rf  j^t) ,  j-1,2, ... ,a) . 

Clearly,  Fj1^  has  the  F-distrlbution  with  2  and  T-2a-l  d.f.  Set 

F^  -  nax(Fjl>,  j-1,2 . a)  .  (3.22) 


If  F^ll  Is  "significant",  then  declare  the  frequency  corresponding  to 
*1*1  as  being  significant,  and  denote  it  by  f^.  Compute  the  residuals 
from  X(t)  after  fitting  the  regression  function  (3.20)  with  fe-1. 

(2) 

2.  At  step  2,  let  be  the  largest  of  the  m-1  F-ratio  statistics, 

(2) 

fj  »  j-l,2,...,m,  JiKl),  obtained  as  follows.  Regress  the  residuals  from 

step  1  on  the  pair  of  components  (cos(2irf^t),  ain(2*f^t))  ,  and  obtain 

(2) 

the  regression  sum  of  squares,  SS'  '  having  2  d.f.  Then,  compute 
the  statistic 

F*2)  -  {Ssf2  ,/BSS  . }(T-2m+l) /2  .  (3*23) 

j  reg,j  m-i 

where  RSS^  ^  is  the  residual  sum  of  squares  obtained  by  regressing 
the  residuals  from  step  1  on  all  the  remaining  m-1  components 

(cos(2irf*T)t),  sin(2frfjT)t),  J-1,2 . .  J*(l».  As  before,  F*2) 

has  an  F-dlstrlbutlon,  but  with  2  and  (T-2nri*l)  d.f.  Set 


F^  -  max{Fj(2),  J-1,2,... ,m,  J*(l)}  .  (3.24) 

(2) 

If  F  is  "significant",  then  declare  the  frequency  corresponding  to 

MX 

(2)  (T) 

F  as  being  significant,  and  denote  it  by  f  .  Compute  the  residuals 

DAX  V*/ 

from  X(t)  after  fitting  the  regression  function  (3.20)  with  fc-2. 

(T)  (T)  (T) 

3.  In  general,  let  *(i)>f(2)****>*(r)  be  the  r  frequencies  declared 
significant  at  the  conclusion  of  step  r.  At  step  rfl,  let  F*av.  be 
the  largest  of  the  n-r  F-ratio  statistics,  Fj1'*’^,  J-l,2,...,m, 
ji*(l)i(2),...,(r),  as  follows.  Regress  the  residuals  from  step  r  on 
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Cf)  (T) 

the  pair  of  component*  (cos(2vf^  t),  sln(2irf^  t)),  and  obtain  tba 
regression  sua  of  squares .  SS'  \ ,  having  2  d.f.  Than,  compute 

rt8»j 

thm  statistic 

P.(r+1>  -  (Ssf^J/RSS^  }(T-2n+2r-l) /2  ,  (3.25) 

J  >*lij  *“r 

where  BSS^  is  the  residual  sua  of  squares  obtained  by  regressing  the 
residuals  froa  step  r  on  all  the  remaining  m-r  components 
(costfwf^t),  aln(2irf<T)t),  J-l,2,...,a,  3*(1),(2) . (r»  ,  and  set 

F^1}  -  naxfFj0*15, J-1,2 . a,  j*(l).(2) . (r)).  (3.26) 

If  Is  "significant",  then  declare  the  frequency  corresponding  to 

F^^  as  being  significant,  end  denote  it  by  If  is 

"not  significant",  then  terminate  the  selection  procedure.  The  value 
of  k  is  then  taken  to  be  the  number  of  frequencies  dedeared  to  be  signi¬ 
ficant  according  to  the  scheme. 

The  question  of  "significance"  In  the  above  algorithm  relates  to  the  pro¬ 
blem  of  comparing ,  at  any  particular  step,  the  observed  value  of  to 

the  tabulated  upper  percentage  points  of  the  distribution  of  the  largest 
order  statistic  from  a  set  of  correlated  F  variables.  Since,  at  step 
r  (1  £  r  <  a) ,  the  denominators  of  each  of  the  {F<r)}  are  identical, 
results  obtained  for  the  multivariate  F  distribution  (Gupta  and  Sobel 
1962;  Krishna lah  and  Araitage  1965)  would  appear  to  be  applicable. 
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However,  Che  numerators  of  the  ere  a  correlated  set  of  X* 

variables,  which.  In  turn,  leads  to  store  complicated  results  for  the 

distribution  theory.  The  paper  by  Krishnalah  sad  Kao  (1962)  may  be 

useful  here  in  defining  a  "generalized"  multivariate  F  distribution; 

see  Johnson  and  Kotz  (1972,  Section  40.8).  For  the  present  paper,  however, 

we  shall  use  a  Bonferronl-type  approximation  to  the  upper  percentage 

points  of  F  similar  to  that  suggested  by  Bartley  for  the  Independent 
max 

case.  Thus,  at  step  r  (1  <  r  <  m),  declare  F^,  to  be  significant  at 

—  —  max 

the  aX  levctl  if  F^  is  larger  than  the  tabulated  (a/(m-r+l))X  point 
max 

of  the  F  distribution  with  2  and  T-2(m-r+l)-l  d.f.  (We  note  here 
that  alternative  approximations  may  be  obtained  from  the  multivariate 
probability  inequalities  of  Karlin  and  Rlnott  (1980).) 

If  additional  significant  frequencies  are  suspected,  but  which  may 
be  hidden  behind  the  more  powerful  ones,  the  following  procedure  will 
be  appropriate.  First,  test  as  above  for  those  m^  frequencies  which 
are  considered  potentially  significant.  Let  k^  of  them  be  declared 
signific-nt.  Then,  obtain  the  residual  series  by  fitting  all  fc^  fre¬ 
quencies  to  X(t) ,  and  recompute  the  amplitude  density  function  for  the 
residual  series.  If  any  additional  frequencies  appear  to  have  relatively 
large  ordinates  in  the  graph  of  that  function,  add  these,  say  In 

number,  to  the  k^  previously  tested,  to  arrive  at  a  total  of  m^k^-hn^. 
Then,  repeat  the  above  F-dlrected  search  procedure  for  all  m  nominated 
frequencies. 
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4.  ANALYSIS  OF  THE  ANNUAL  SUNSPOT  NUMBERS 

The  techniques  of  this  paper  will  now  be  illustrated  using  the  well- 
known  time  series  of  annual  sunspot  relative  numbers  from  1749  to  1979, 
a  period  of  T  -  231  years.  The  data  are  taken  from  Haldmeler  (1961, 
pp.  20-21)  for  the  years  1749-1960,  and  from  Zzenman  (1981,  Table  1.1) 
for  the  remaining  years. 

The  sunspot  numbers  have  received  a  (well-deserved)  reputation  in 
the  statistical  literature  for  being  one  of  the  most  difficult  series  to 
model  successfully,  primarily  because  of  its  nonstandard  features 
(Bloomfield  1976).  We  emphasise  here  that  the  analysis  presented  in 
this  Section  is  not  claimed  to  be  the  best  for  predicting  sunspot  behavior 
In  the  future;  Indeed,  the  nonCaussian,  nonstationary,  aperiodic,  and  non¬ 
linear  behavior  of  the  series  make  its  fitting  and  prediction  an  extremely 
complex  affair.  However,  it  is  precisely  because  of  these  nonstandard 
features  that  make  the  sunspot  numbers  such  an  Interesting  and  nontrivial 
example  for  demonstrating  the  methodology  of  this  paper. 


4.1.  The  Annual  Sunspot  numbers 

The  graph  of  the  amplitude  density  function  (2.20)  with  Af  -  10  of 
the  mean-corrected  annual  sunspot  numbers  is  displayed  in  Figure  1.  The 
prominent  features  of  that  graph  include  two  clusters  of  large  peaks,  one 
at  the  very  low  frequencies,  and  the  other  at  frequencies  around  the 
so-called  "11-year  sunspot  period".  Minety-elght  local  maxima  were 
located  within  the  frequency  band  0  <, f  The  next  step,  therefore,  is 

to  screen  these  peaks  for  significant  frequencies.  We  shall  use  the  two 
search  procedures  described  in  Section  3. 
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(1)  First,  we  consider  the  search  method  outlined  in  Section  3.1. 

* 

The  total  variance  of  the  series  s^  ■  1583.32826.  If  we  carry  out  the 
procedure  as  described,  we  pass  a  total  of  nine  frequencies,  e  number 
comparable  to  that  obtained  by  Damsleth  and  Spjdtvoll  (1982).  However, 
when  compared  to  results  obtained  by  Hill  (1977),  Rust  and  Kirk  (1977), 
and  Siddiqul  and  Izenman  (1981)  for  the  monthly  sunspot  numbers,  and 
since  It  can  be  argued  that  the  major  frequencies  In  the  «n«n«l  series 
should  be  similar  In  number  and  location  to  those  major  low  frequencies 
in  the  monthly  series,  we  conclude  that,  for  some  reason,  too  few  frequen¬ 
cies  are  being  passed  here.  The  problem  lies  In  the  fact  that  the  annual 
sunspot  numbers  are  computed  from  the  monthly  numbers  by  averaging  In  dis¬ 
joint  blocks  of  length  twelve,  thereby  smoothing  out  the  highest  amplitudes 
of  the  monthly  sunspot  numbers.  An  adjustment  Is  necessary,  therefore  —  in 
a  downwards  direction  —  to  the  entries  in  Shlmshoni's  Tables  to  take  this 
Into  account.  We  propose  to  do  this  here  by  multiplying  Shlmshonl's  cri- 
tlcal  values,  gj  T  Q,  by  a  "compensation  factor",  h^,  say,  0  <  h^  £  1. 
which  we  take  to  be  a  location  parameter  of  the  distribution  of  the  ratio 
of  the  mean  to  the  maximum  In  a  sample  of  size  twelve  from  some  appropriate 
underlying  distribution  defined  on  [0,®) .  In  general,  this  problem  is  non¬ 
trivial  and  remains  open.  A  related  reference  is  the  paper  by  Mailer  and 
Resnick  (1981),  which  considers  the  limiting  behavior  for  large  sample  sizes 
of  the  above  ratio  statistic;  however,  their  results  do  not  seem  to  be  of 
help  to  us. 

Since  the  results  we  need  are  not  currently  available,  we  are  led  to  the 
following  approximation.  From  the  monthly  sunspot  numbers,  we  computed  the 
231  ratios  of  annual  mean  to  "annual  maximum"  (that  is,  the  maximum  monthly 
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sunspot  number  for  that  year) ,  and  than  sat  ■  0.62,  tba  aadian  of  all 

those  valuss.  Tba  histograa  of  tha  annual  ratios  is  displayed  la  Figure  2. 
2  2 

Thus,  X00  o  05  "  This  critical  value  paaaes  the 

following  eight  frequencies: 


J 

A T> 
*(J) 

0(T) 

P(J) 

Cc(T)]2 

U(J)J 

1 

.0902 

11.087 

780.7 

2 

.1005 

9.950 

616.3 

3 

.0107 

93.346 

343.7 

4 

.0945 

10.582 

341.3 

5 

.1062 

9.416 

246.7 

6 

.0172 

58.140 

173.0 

7 

.0835 

11.976 

159.5 

8 

.0039 

256.410 

118.7 

where  p^j  Is  the  period  (In  years)  corresponding  to  that  is,  p£jj  • 

1/f  ‘  jj.  Now,  j^-9,  and,  by  Interpolation  in  Shlashonl's  Table,  we  get  that 

*9,100,0.05  "  °-02902*  whenc*  2®Th 12*9 , 100 ,0.05  "  35‘32*  ^  8econd 

critical  value  passes  eight  further  frequencies,  naaely. 


1 

fW 

(T) 

P(J) 

Cc(T)]2 

1  <i>3 

9 

.1168 

8.562 

76.3 

10 

.0772 

12.953 

71.7 

11 

.0232 

4.698 

66.1 

12 

.1115 

8.969 

54.7 

13 

.0716 

13.966 

54.5 

14 

.1233 

8.110 

48.8 

15 

.0473 

21.142 

46.6 

16 

.0668 

14.970 

43.0 

At  the  next  stage,  jj"17*  80  that  *17,100,0.05  "  0.02177,  and  2*xhi2*i7tioo, 

(Tl  2 

-  26.50.  The  next  largest  value  of  [c  ]  is  xx.x,  corresponding  to  the 
.xxxx;  however,  this  value  is  too  low  to  be  passed  by  this  test. 


.05 


frequency 


<33 


The  value  of  m^  la,  therefore,  16.  A  computation  of  the  resulting 
sixteen-component  regression  function,  and  of  the  amplitude  density  function 
of  the  residual  series,  revealed  an  additional  frequency  Which  is  passed 
by  the  above  test,  namely. 


,<T> 

11L 


_<T> 

lUL 


re(T)i2 

ifmL 


17 


.1199 


8.340 


46.1 


so  that  102*1,  and  hence  of17. 

Now,  we  make  a  final  selection  from  these  seventeen  frequencies.  After 
computing  the  fitted  regression  function  using  these  seventeen  frequency 
components,  we  examined  the  autocorrelation  and  partial  autocorrelation 
functions  of  the  residual  series,  and  determined  that  the  best,  and  moat 
parsimonious  fit  to  that  latter  series  would  be  a  second-order  autoregressive 
process  of  the  form 

e^Ct)  -  41c(T)(t-l)  +  ♦2«(T)(t-2)  +  «(t), 

9 

where  6(t)  is  a  white  noise  process  with  mean  zero  and  variance  <r^.  The 
maximum-likelihood  estimates  and  associated  standard  errors  of  the  two 
parameters  in  the  model  are: 

parameter  MLE _ SE 

il  .6736  .0632 

+2  -.3722  .0662 


2 

6 


146.4.  The  roots  of  the  corresponding  characteristic  equation 
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(T) 

are  both  cooplex,  indicating  pseudo-periodic  behavior  of  the  e  (t)- 
process  (Box  and  Jenkins  1976 ,  pp.  58-63).  The  estimated  spectral 
density  function  of  the  residual  series  Is,  therefore,  given  by 

g<T)(f)  -  2o2  /{(l45+42)~2*l(l"*2)c08<2*f)“2*2c08(4irf)> 

-292.8/(1. 5923-1 .8486cos (2»f)+0 . 7444cos (4vf ) ) , 

(T) 

Where  0  <,  f  <  >j.  The  maximum  of  (f )  occurs  at  f  -  fA,  where 

cos(2rf*)  -  61(l-42)/(-492)  "  °-6208; 

that  is,  f*  -  0.1434.  Thus.  g(T)(f)  <a  <T)(f*)  -  1068.2325.  so  that 
var{ | | }  <  1068.2325/231  -  4.6244.  Conservative  95  percent  confidence 
limits  on  |r^(  are  given  by  |tjT^|  *  4.3009.  Examination  of  the  seventeen 
values  of  Iy^^I  showed  that  these  limits  passed  fourteen  out  of  the  seven¬ 
teen  frequencies  passed  at  the  preliminary  screening;  the  three  frequencies 
that  were  eliminated  are  .1168,  .0716,  and  .1115.  Table  1  shows  the  final 
fitted  regression  coefficient  estimates  for  the  fourteen-component  function, 
and  Figure  3  gives  a  visual  display  of  the  observed  series  of  annual 
sunspot  numbers  and  the  fitted  regression  function.  The  amount  of  total 
variance  explained  by  the  fitted  function  is  85.4  percent. 

(2)  Next,  we  consider  the  F-dlrected  search  method  as  described  In 
Section  3.2.  Eighteen  frequencies  were  nominated  as  potentially  signifi¬ 
cant,  which  were  also  the  largest  eighteen  peaks  In  the  graph  of  the 
amplitude  density  function  (see  Figure  1).  The  steps  In  the  search  proce¬ 
dure  are  set  out  In  Table  2,  where  we  have  taken  a  -  1.0  and  m  -  18. 

At  step  r,  we  compare  the  value  of  to  the  (l/(19-r))Z  point  of 
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Che  F  distribution  with  2  and  192 +  2r  d.f.f  and  note  that 


F$*1X  -  6.91 


.0.5* 


•  *  -  5.30, 
»• 


-1.0*  x  xi 

FT  •  4.61 

2,m 


(4.1) 


Thus,  15  out  of  the  18  nominated  frequencies  were  found  significant  by 
this  aethod,  and  the  amount  of  total  variance  explained  by  the  fitted 
function  is  86.4  percent. 

To  summarize  the  above  results,  we  note  the  following.  First,  we 
were  able  to  separate  the  two  major  frequencies  .0902  (11.087  years)  and 
.0945  (10.582  years)  from  each  other  which  are  less  than  1/T  apart. 

Second,  although  perlodogram  analysis  of  the  annual  sunspot  numbers 
exhibits  a  "broad  peak  of  indeterminate  width"  extending  from  the 
frequency  .0625  to  the  frequency  .1250  (Bloomfield  1976,  p.  96),  we  have 
shown  that  there  are  actually  nine  quite  distinct  and  significant 
frequencies  inside  that  band.  We  also  identified  a  new  major  frequency  in 
the  series,  .0039  or  256.41  years,  which  should  be  taken  with  some  caution 
since  its  length  exceeds  the  length  of  the  series  Itself.  However,  it 
does  suggest  that  the  resolution  of  this  technique  is  high  enough  to  enable 
it  to  Identify  cycles  from  series  whose  length  is  less  than  the  period  of 
that  cycle. 


4.2.  The  Signed  Annual  Sunspot  Numbers 

Since  the  sunspot  numbers  are,  by  definition,  nonnegative,  this 
introduces  an  implicit,  but  major,  restriction  onto  the  form  of  the  fitting 
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function.  Any  unconstrained  model  that  attempts  to  follow  closely  the 
course  of  the  sunspot  numbers  over  time  will,  therefore,  be  estimated  by 
a  function  that  will  not  necessarily  be  nonnegative  at  each  of  its  minima. 
Such  a  feature  may  be  seen,  for  example,  in  Figure  3,  where  certain  very 
low  minima  are  fitted  by  negative  numbers.  For  many  similar  series,  little 
or  nothing  can  be  done  Co  alleviate  this  problem  without  making  the  model 
correspondingly  more  complicated.  However,  we  are  fortunate  In  that  a 
very  natural  and  physically  meaningful  data  transformation  Is  available  to 
us  which  eliminates  this  problem  and,  furthermore,  dramatically  improves 
the  performance  of  the  model. 

In  1908,  G.  E.  Hale  discovered  the  existence  of  the  solar  magnetic 
cycle,  in  which  alternate  11-year  sunspot  cycles  are  accompanied  by  a 
reversal  in  polarity  of  the  sun's  magnetic  field  (Bray  and  Loughhead  1964). 
Loosely  speaking,  cycles  having  'positive*  magnetic  polarity  are  Invariably 
succeeded  by  cycles  having  'negative'  polarity,  and  vice-versa.  These 
results  led  Hale  in  192S  to  conclude  that  a  "complete"  solar  cycle  was 
actually  of  the  order  of  22  years,  from  the  start  of  a  positive  cycle 
through  to  the  start  of  the  next  positive  cycle.  This  fact,  now  well- 
established  in  the  scientific  literature,  led  Bracewell  (1953)  in  a  short 
letter  to  the  Journal  Nature  to  suggest  that  perhaps  sunspot  numbers 
corresponding  to  years  of  negative  magnetic  cycles  be  assigned  a  minus 
sign,  while  sunspot  numbers  associated  with  years  of  positive  cycles  be 
unchanged.  The  result  of  such  a  transformation  is  a  reflection  through 
the  time  axis  of  every  other  11-year  cycle  of  sunspot  numbers.  Thus,  the 
series  would  no  longer  be  constrained  in  the  above  sense.  Now,  minima 
would  be  represented  by  the  sero-crosslng  points  of  the  graph  of  the  series 
through  the  time  axis,  negative  numbers  would  possess  a  natural  physical 
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interpretation,  and  maxima  would  be  given  by  the  sequence  of  alternating 
positive  and  negative  extrema  of  the  transformed  series.  For  our  purposes, 
it  is  necessary  to  assume  that  the  solar  magnetic  cycle  was  present  for 
at  least  three  hundred  years  prior  to  its  discovery.  It  is  surprizing 
to  note  that  although  Bracevell's  suggestion  appeared  thirty  years  ago, 
we  could  find  only  three  articles  in  which  similar  suggestions  were  put 
forward;  these  were  in  a  brief  reference  to  Bracewell's  letter  by  Moran 
(1954),  a  comment  (but  not  followed  up)  in  Brill inger  and  Rosenblatt  (1967), 
p.  215,  and  in  a  recent  note  in  Mature  by  Hill  (1977). 

A  technical  remark  on  the  use  of  this  transformation  is  in  order  here. 
Since,  in  most  Instances,  a  minimum  occurs  midway  through  a  year,  the 
question  is  raised  as  to  whether  that  year  be  assigned  a  positive  or  a 
negative  value.  One  way  of  settling  this  is  to  examine  the  series  of 
monthly  sunspot  numbers  for  the  minimum  year  in  question,  determine  a 
month  of  minimum  by  inspection  (which,  in  some  cases,  may  not  be  unique  or 
even  obvious),  assign  plus  and  minus  signs  as  appropriate  up  to  and  then 
following  the  month  of  minimus,  and  finally,  recompute  a  new  annual  mean 
for  that  year.  The  general  effect  of  such  a  recomputed  transition  sunspot 
number  is  to  smooth  the  transition  from  positive  years  to  negative  years 
and  vice-versa.  Fortunately,  it  appears  that  precise  determination  of  the 
month  of  minimum  is  not  crucial  for  a  successful  application  of  the  methods 
of  this  paper,  and  anyway,  in  some  cases,  a  definitive  minimum  month  will 
be  difficult  to  resolve  by  means  other  than  by  guesswork. 

-4 

The  graph  of  the  amplitude  density  function  (2.20)  with  Af  “  10  of 
the  mean-corrected  signed  annual  sunspot  numbers  is  displayed  in  Figure  4. 
Ninety-four  local  maxima  were  located  within  the  frequency  band  0  <  f  <,  *j, 
and  the  most  prominent  feature  visible  in  the  graph  is  that  the  two  clusters 
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o£  frequencies  in  Figure  1  have  now  been  replaced  by  a  single  duster 
approximately  midway  between  them.  The  transformation  appears  to  have 
removed  most  of  the  longer  cycles  in  the  original  series. 

(1)  The  total  variance  of  the  transformed  series  is  sj  ■  4101.5426,  and 
from  Shlmshonl's  Table  2(b),  we  again  have  gj  jq..  q  05  "  0.07378.  Ve  now 
need  to  make  a  similar  argument  as  for  the  original  series  In  Section  4.1 
and  come  up  with  a  value  for  h^  th®  compensation  factor  in  going  from  the 
series  of  signed  monthly  sunspot  numbers  to  the  signed  annual  numbers.  To 
obtain  the  signed  monthly  numbers,  we  would  have  to  go  back  to  the  dally 
sunspot  numbers;  rather  than  do  this,  ve  note  the  following.  For  the 
"positive"  cycles,  the  ratio  of  signed  annual  mean  to  signed  annual  maximum 
for  each  year  will  be  the  same  as  those  for  the  original  series,  and  for  the 
"negative”  cycles,  the  ratio  of  signed  annual  mean  to  signed  annual  maximum 
will  be  equal  to  the  ratio  of  (original)  annual  mean  to  (original)  annual 
minimum,  which,  in  turn,  will  be  greater  than  or  equal  to  the  ratio  of 
(original)  annual  mean  to  (original)  annual  maximum.  Bence,  from  the 
results  of  Section  4.1,  we  can  assume,  for  the  signed  series,  0.62  <_  hj2 
To  allow  ««»•■>•« «■»«■»  flexibility  here,  we  take  hj^«0.62.  Thus,  ^8x^1281, 100, 0.05 

-  232.65,  which  passes  the  following  five  frequencies: 


j 

f(T) 

*(1) 

(T) 

P(l> 

Cc(T))2 

LcmJ 

1 

.0449 

22.272 

5374.0 

2 

.0557 

17.953 

1545.6 

3 

.0505 

19.802 

983.7 

4 

.0611 

16.367 

491.9 

5 

.0388 

25.775 

298.4 

Now,  Jj-6,  and,  by  Interpolation,  we  get  85^00,0  .05  -°  .03504,  whence 
2s^h^2g6  10Q  0  q5  - 110.49.  This  critical  value  passes  two  further  frequencies. 
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namely. 


3 

#(T) 

*«) 

(T) 

P(J> 

rc(T)32 

LC(i)J 

6 

.0343 

29.255 

161.5 

7 

.0269 

37.175 

132.6 

Next,  j2-8,  so  that  gg>100>0 

This  critical  value  passes  a 

,05  -  0.03104,  and  2  100, 0.05  “  97*88' 

single  additional  frequency,  namely. 

i 

f(T) 

f(3) 

<r)  ,  Ct),2 

.  PU>  Cc(1)3 

8 

.0802 

12.469  102.4 

This  time,  j3“9,  89>100>0.05 

passes  the  two  frequencies 

-  0.02902, 

and  2 8x**  1289,100, 0.05  "  91*51’  Vhlch 

i 

f(T) 

f(j)_ 

B(T) 

P(1) 

Cc(T)]Z 

U(1)J 

9 

.1454 

6.878 

95.4 

10 

.1379 

7.252 

94.6 

Now,  JA-11,  811>100>0.05  "  0.02627,  and  2sxh128ll, 100, 0.05  "  82,84*  The 

-  (T)  2 

next  largest  value  of  ]  la  79.4,  which  corresponds  to  the  frequency 
.1508  (or,  6.63  years),  but  which  is  too  small  to  pass  the  test.  Hence, 
nij-lO,  and  a  screening  of  the  residual  series  from  the  ten-component  fit 
gives  n^O,  so  that  m->10. 

As  in  Section  4.1  we  make  a  final  selection  from  these  ten  frequencies. 
The  best,  and  most  parsimonious,  fit  to  the  residual  series  after  fitting 
a  ten-component  regression  function  from  the  above  frequencies  is  a  second- 
order  autoregressive  process  of  the  form  (4.1),  where  the  estimated 
parameters  and  their  standard  errors  are: 
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parameter _ MLE _ SE 

♦j  .8945  .0640 

♦2  -.3523  .0641 

and  or,  «*  183.1.  As  before,  the  roots  of  the  characteristic  equation  are  both 
c 

complex,  indicating  pseudo-periodic  behavior,  and  the  estimated  spectral 
density  function  of  the  residual  series  is  given  by 

g<X>(f)  -  366 . 2/{  1 . 9242-2 . 4 193cos  (2vf  )+0 . 7046cos  (4vf  )  } , 

where  0  <,  f  <,%,  with  a  maximum  value  at  f  ■  f t  -  (2a)  *cos  *(0.8584)  m  0.0857. 
Hence,  g(T)(f)  <  g(T)(f*)  -  2019.38,  so  that  var{|YfX)|)  <  2019.38/231  -  8.74. 

C  6  *  J 

Conservative  95  percent  confidence  limits  on  | Yj  1  are,  therefore,  given  by 
{ Yj 1  ±  5.9133.  These  limits  passed  all  but  one  of  the  frequencies  passed 
at  the  preliminary  stage,  the  frequency  being  eliminated  was  .0343.  Table  3 
lists  the  coefficient  estimates  of  the  nine-component  regression  function, 
and  Figure  5  gives  a  visual  display  of  the  signed  annual  sunspot  numbers  and 
the  fitted  regression  function.  The  amount  of  total  variance  of  the  signed 
series  explained  by  the  fitted  function  is  91.0  percent. 

(2)  For  the  F-dir acted  search  procedure,  fifteen  frequencies  were  nominated 
as  potentially  significant,  which  were  also  the  largest  peaks  in  the  graph  of 
the  amplitude  density  function  (see  Figure  4).  The  steps  In  the  search  pro¬ 
cedure  are  set  out  in  Table  4,  where  we  have  taken  a  -  1.0  and  a  -  15. 

At  step  r,  we  compare  the  value  of  F^  to  the  (l/(16-r))Z  point  of  the 
F  distribution  with  2  and  198+  2r  d.f.  (see (4.1)).  Thus,  12  out  of  the  15 
nominated  frequencies  were  found  significant  by  this  method,  and  the  amount 
of  total  variance  explained  by  the  fitted  function  is  92.6  percent. 
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4.3.  The  Frequency  Trace 

One  way  of  judging  the  extent  of  nonstatlonarlty  In  a  tine  series  is  to 
complex  demodulate  the  series  at  a  specific  frequency  (Bingham,  Godfrey,  and 
Tukey  1967).  When  this  is  done  for  the  annual  sunspot  numbers  at  a  frequency 
corresponding  to  an  11-year  cycle  (Bloomfield  1976,  pp.  137-140),  substantial 
variations  over  time  in  both  amplitude  and  phase  become  lomediately  visible. 
The  techniques  presented  in  this  paper  can  now  be  used  to  complement  that 

approach  so  that  we  can  actually  quantify  the  movement  (if  any)  of  any 
frequency  estimate  over  time.  Such  a  direct  measurement  of  "frequency 
drift"  is  made  possible  only  through  a  high-resolution  frequency 
analysis. 

If  a  process  is  truly  stationary  over  time  with  fixed  periodicities, 
then  by  extending  the  length  of  the  physical  record  for  that  process 
by  concatenating  additional  observations  to  either  end,  the  resulting 
frequency  analysis  should  not  be  radically  effected.  In  other  words, 
we  should  not  expect  the  overall  shape  of  the  graph  of  the  amplitude 
density  function  to  alter  perceptively  by  a  lengthening  of  the  series 
record.  This  argument  suggests  two  distinct  ways  of  obtaining  a 
sequence  of  estimates  of  each  major  frequency  of  the  series.  They 
are  the  following: 

( 1)  Fix  an  initial  reference  time  point  (which  could  be,  for 

example,  the  first  observation),  compute  the  graph  of  the  amplitude 

density  function  for  an  initial  series  length  of  Tj  values  (where 

T.  <  T)  starting  from  that  initial  time  point,  and  thereby  obtain  tbe 
1  <T.) 

estimate  of  f^.  Next,  Increase  the  record  length  to  T2  obser¬ 

vations  by  adding  Tj-T^  more  recent  data  points  to  the  Tj  values 
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•lready  analyzed  (T,  <  T),  and  repeat  all  previous  computations  to 

2  <t2) 

get  a  second  estimate  ft  of  f ^ .  Thus,  by  successively  Increasing 

the  amount  of  data  available  on  the  process  by,  say,  n  steady  increments 

of  T...-T.  observations,  i-1,2,. ..,n,  and  where,  typically,  T  . ,  -  T, 

1+1  i  /_  \  n+i 

(T.) 

ve  can  obtain  a  sequence  of  "forward  estimates"  f ^  ,  i-1,2,. .., n+1, 

of  f^. 

(2)  The  alternative  procedure  is  to  fix  a  final  reference  time 
point  (the  latest  observation,  perhaps) ,  compute  the  amplitude  density 


function  for  a  series  record  of  length  T.  terminating  at  that  final 

1  (T  ) 

time  point  (Tj  <  T),  and  obtain  the  estimate  f^  of  f^.  Now, 

Increase  the  record  length  to  Tj  observations  by  adding  Tj-Tj  more 

ancient  observations  to  the  front  end  of  the  series,  and  obtain  a 
<T  > 

second  estimate  fj  of  f ^ .  In  this  way,  by  increasing  in  a 

systematic  fashion  the  amount  of  data  available  on  the  process,  we 

<T  > 

can  obtain  a  sequence  of  "reverse  estimates"  f^  ,  i-1,2, . . .  ,n+l,  of 
£J,  where  Ti  <  T1+l  and  Vl  '  T- 

In  certain  situations,  such  as  for  the  sunspot  numbers  series,  both 

approaches  may  be  of  independent  Interest. 

(T.) 

Thus,  if  f  ,  i-1,2, . . .,n+l,  denote  the  n+1  estimates  of  f,  as 
3  <T.)  i 

determined  by  either  of  the  two  methods  above,  ve  now  plot  f ^ 

against  T^,  1-1,2 . n+1.  If  the  corresponding  period  p^  -  1/f ^  is 

stable  over  time,  the  resulting  plot  should  reveal  an  approximate 

straight-line  parallel  to  the  horizontal  (time)  axis.  Ve  call  such  a 

(T) 

plot  the  frequency  trace  of  fj  ,  and,  to  distinguish  between  the  two 

types  of  estimates,  we  shall  refer  to  the  trace  as  either  a  forward 

(T) 

frequency  trace  or  as  a  reverse  frequency  trace  of  f ^  according  as 

the  initial  or  the  final  reference  time  point  is  held  fixed.  There  is 
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no  reason  Co  expecc  these  two  Cypes  of  frequency  traces  to  be  Identical. 
Particular  and  persistent  deviations  from  a  straight-line  plot  will 
give  a  quantitative  description  of  the  direction  and  extent  of  any 
nonstationarity  present  in  the  aeries. 

In  Figures  6a,  b,  and  c,  ve  have  plotted  frequency  traces  for  the 

triplet  of  close  frequencies  from  Table  1  around  the  so-called  "11-year 

sunspot  period",  namely,  f^  -  .0902,  f^  -  .1005,  and  f^T*  -  .0835, 

-4 

with  Af  »  10  and  T^-T^  ■  2,  i»l,2,...,n,  for  the  annual  sunspot 
numbers.  These  plots  each  Indicate  nonstationarity  of  the  series.  For 
example,  if  ve  use  all  231  available  sunspot  numbers  for  1749-1979,  the 
most  prominent  frequency  is  estimated  to  be  .0902  (11.087  years);  how¬ 
ever,  the  most  recent,  and  certainly  the  most  reliable,  data  (at  least, 
since  1930)  suggest  that  it  night  be  wiser  to  estimate  it  (say,  for 
prediction  purposes)  as  .0952  (10.504  years).  Apparently,  the  "11-year 
sunspot  cycle"  has  been  getting  systematically  shorter  with  time.  Others 
(such  as  Currie  1980)  have  also  noticed  this  phenomenon,  but  by  using 
different  methods.  He  have  also  plotted  in  Figures  7a,  b,  and  c  the 
frequency  traces  for  the  three  longest  periods  detected  in  the  sunspot 
series  (see  Table  1),  namely,  f^  -  .0107,  f^  -  .0172,  and  f ^  • 
.0039.  The  rapid  shifts  in  the  various  frequency  estimates  over  time 
may  have  implications  for  the  historians'  claims  of  variability  in  the 
quality  of  the  actual  sunspot  numbers  used  to  compile  the  more  ancient 
data  (see  Eddy  1976;  Izenman  1981). 
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Table  1.  Estimate*  of  parameters  for  the  14-coaponeat 
fit  to  the  serisa  of  annual  mean  sunspot  relative  numbers, 
1749-1979.  Frequency  components  are  ordered  by  decreaalag 
magnitudes  of  the 


i 

*r 

•r 

lrf>l 

0 

constant 

49.019 

1 

.0902 

11.087 

25.115 

15.181 

29.347 

2 

.1005 

9.950 

14.894 

12.115 

19.199 

3 

.0107 

93.346 

6.062 

15.177 

16.343 

4 

.0835 

11.976 

-11.246 

10.807 

15.597 

5 

.0945 

10.582 

9.059 

-10.688 

14.011 

6 

.0172 

58.140 

-  7.889 

-  7.246 

10.712 

7 

.1233 

8.110 

3.870 

-  9.957 

10.682 

8 

.1062 

9.416 

-  0.634 

9.448 

9.469 

9 

.1199 

8.340 

1.434 

-  8.911 

9.026 

10 

.0473 

21.142 

5.759 

-  5.117 

7.704 

11 

.0039 

256.410 

4.927 

-  5.171 

7.143 

12 

.0772 

12.953 

-  2.261 

6.008 

6.419 

13 

.0232 

4.698 

4.828 

-  3.610 

6.028 

14 

.0668 

14.970 

-  0.296 

-  5.034 

5.043 
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Table  2.  Summary  of  ANOVA  selection  of  significant  frequencies  for  tha 
tine  series  of  annual  mean  sunspot  relative  numbers,  1749-1979.  Frequen¬ 
cies  vere  entered  into  tha  model  if  their  corresponding  ?n»-*  statistic 
vas  significant  at  the  1Z  level  (see  text.  Section  3.2).  The  total  sum 
of  squares  for  the  series  is  364655  with  230  d.f.,  and  m  was  taken  as  18. 


r 

f(T> 

£<r> 

ss*r^ 

reg 

RSS 

m-r+1 

p(r) 

BiflX 

R2 

1 

.0902 

90439 

45317 

193.58 

25.8Z 

2 

.1005 

59578 

46632 

125.20 

21.7Z 

3 

.0107 

40481 

47818 

83.81 

52.52 

4 

.0835 

31159 

49322 

63.17 

61.42 

5 

.0945 

20040 

49671 

40.75 

67.3Z 

6 

.0172 

14896 

42239 

35.98 

71.6Z 

7 

.1233 

10228 

46359 

22.72 

74. 4Z 

8 

.1062 

9279 

47045 

20.51 

77. 1Z 

9 

.1199 

7895 

46828 

17.70 

79.5Z 

10 

.0473 

6167 

46607 

14.02 

81.22 

11 

.0039 

4254 

46761 

9.74 

82.42 

12 

.0772 

4012 

46411 

9.33 

83.6Z 

13 

.0232 

3774 

46284 

8.88 

84.7Z 

14 

.1833 

3508 

46015 

8.38 

85. 7Z 

15 

.0668 

2693 

45937 

6.50 

86. 4Z 

I 
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Table  3*  Estimate*  of  parameters  for  the  nine-component 

fit  to  the  scries  of  signed  annual  mean  sunspot  relative 

numbers,  1749-1979.  Frequency  components  are  ordered  by 

(T) 

decreasing  magnitudes  of  the  IYj  | . 


3 

f> 

•r 

a(T) 

WjT)l 

0 

constant 

-  1.324 

l 

.0449 

22.272 

68.812 

11.227 

69.722 

2 

.0557 

17.953 

14.650 

23.496 

27.689 

3 

.0388 

25.775 

-16.952 

2.297 

17.107 

4 

.0505 

19.802 

15.209 

1.347 

15.269 

5 

.0411 

16.367 

-  9.392 

7.896 

12.270 

6 

.0802 

12.469 

8.973 

6.045 

10.819 

7 

.0269 

37.175 

3.771 

9.607 

10.321 

8 

.1454 

6.878 

9.154 

1.792 

9.328 

9 

.1379 

7.252 

-  8.067 

3.156 

8.662 
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Table  4.  Summary  of  ANOVA  selection  of  significant  frequencies  for  the 
tine  series  of  signed  annual  aean  sunspot  relative  numbers,  1749-1979. 
Frequencies  were  entered  Into  the  model  if  theire  corresponding  Fm 
statistic  was  significant  at  the  1Z  level  (see,  text.  Section  3.2).  The 
total  sun  of  squares  for  the  series  is  943355  with  230  d.f . ,  and  n  was 
taken  as  15. 


r 

£<t> 

*(r) 

ss*r* 

reg 

RSS 

m-r+1 

F<r> 

max 

R2 

1 

.0449 

626448 

61198 

1023.64 

66.4Z 

2 

.0557 

106372 

70863 

151.61 

77. 9Z 

3 

.0388 

35333 

66386 

54.29 

81. 8Z 

4 

.0505 

24925 

64885 

39.57 

84. 7Z 

5 

.0611 

13641 

62575 

23.43 

86. 2Z 

6 

.0802 

13799 

61830 

22.67 

87. 6Z 

7 

.0269 

11989 

61755 

20.58 

88.9Z 

8 

.1454 

10783 

61640 

18.72 

90. 1Z 

9 

.1379 

8278 

62154 

14.38 

91.0Z 

10 

.1287 

5842 

61765 

10.31 

91.6Z 

11 

.1031 

4652 

61713 

8.29 

92. 1Z 

12 

.1508 

4566 

61667 

8.22 

92. 6Z 

50- 
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PIC.  2.  Histogram  of  the  231  ratios  of  annual  mean  sunspot  number  to 
maximum  monthly  sunspot  nunfcer  that  year  for  the  period  1749-1979. 
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